Gender identification in Russian written texts
Similar resources
Gender Identification in Russian Texts
Gender identification is the task of determining an author's gender from written text. A hybrid approach for Russian texts has been designed that combines a deep neural network with a rule-based classifier. LSTM and BiLSTM layers are used in the neural network because of their capability to learn long-term dependencies.
Automatically Categorizing Written Texts by Author Gender
The problem of automatically determining the gender of a document's author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with ap...
Gender, Genre, and Writing Style in Formal Written Texts
This paper explores differences between male and female writing in a large subset of the British National Corpus covering a range of genres. Several classes of simple lexical and syntactic features that differ substantially according to author gender are identified, both in fiction and in non-fiction documents. In particular, we find significant differences between male- and female-authored docum...
Cross-genre Gender Identification in Russian Texts Using Topic Modeling Working Note: Team DUBL
In this paper, we describe the results of gender identification from Team DUBL. We used a topic modeling approach to identify the author's gender from his or her written texts. The model was trained on the RusProfiling PAN 2017 Twitter Corpus, which contains data in the Russian language. The model has been evaluated on texts of other genres, including texts such as letters to a friend, online...
Automatic Language Identification from Written Texts – An Overview
Language identification is the task of automatically identifying the language(s) in which the content of a document (web page, text document) is written. Due to the widespread use of the internet, language identification has become an important preprocessing step for a number of applications such as machine translation, Part-of-Speech tagging, linguistic corpus creation, supporting low-density ...
Journal
Journal title: XLinguae
Year: 2017
ISSN: 1337-8384,2453-711X
DOI: 10.18355/xl.2017.10.03.14